{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Report 2: Analysis and Prediction of Titanic Passenger Survival\n",
    "\n",
    "### Name: 黄红亮\n",
    "\n",
    "### Student ID: 2021260165\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Data prediction, linear regression, random forest"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Problem Statement (Translated)"
   ]
  },
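  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The sinking of the Titanic is one of the most famous shipwrecks in history. On its maiden voyage, the Titanic struck an iceberg and sank, killing 1502 of the 2224 passengers and crew. The tragedy shocked the international community and led to better safety regulations for ships. One reason the disaster cost so many lives was that there were not enough lifeboats for the passengers and crew. Although many factors affected the chance of surviving the sinking, some groups were more likely to survive than others, such as women, children, and the upper class.\n",
    "\n",
    "The training set is used to build the machine learning model. New features can also be engineered from the existing ones (for example, from each passenger's sex) and used in training. The test set is used to see how the model behaves on unseen data: the true outcome of each test passenger is not provided, and predicting it is the task. For each passenger in the test set, the trained model predicts whether they survived the sinking.\n",
    "\n",
    "The challenge asks for an analysis of what kinds of people were likely to survive and, in particular, for machine learning predictions of which passengers survived the tragedy."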
   ]
  },
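  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Description"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " * Training set (train.csv): 891 passenger records, used to build and train the machine learning model. Some passenger attributes are missing, but the survival outcome is provided for every passenger. The model is built on the relationship between basic features, such as sex and passenger class, and survival."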
   ]
  },
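  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " * Test set (test.csv): 418 passenger records. Some passenger attributes are missing here as well. The missing values must be filled in, and the model built on the training set is then used to predict whether each of these passengers survived."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " * gender_submission.csv: an example submission file whose predictions assume that all female passengers, and only female passengers, survived."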
   ]
  },
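  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Goal"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using the passenger information and survival outcomes in the training set, predict whether each passenger in the test set survived the sinking of the Titanic. For every test passenger the prediction is 1 for survived and 0 for did not survive."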
   ]
  },
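  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Approach"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This task poses three main problems:\n",
    "\n",
    "1. The data set has many missing values, so the first step is to fill them in.\n",
    "2. Many columns are text, such as `Name` and `Cabin`, and must be converted to numeric features.\n",
    "3. The processed data is used to train the models, which are then applied to the test set to make predictions."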
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The two models used"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. Linear regression model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Random forest model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Import libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import libraries\n",
    "from sklearn.model_selection import cross_val_score\n",
    "import re\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training set shape: (891, 12)\n",
      "Test set shape: (418, 11)\n"
     ]
    }
   ],
   "source": [
    "# Read the data\n",
    "train_data = pd.read_csv('D:\\\\gitee\\\\AI_HW_2021260165-HHLK\\\\report_02_Titanic\\\\data\\\\train.csv')\n",
    "test_data = pd.read_csv('D:\\\\gitee\\\\AI_HW_2021260165-HHLK\\\\report_02_Titanic\\\\data\\\\test.csv')\n",
    "\n",
    "# Check the shape of the training and test sets\n",
    "print('Training set shape:', train_data.shape)\n",
    "print('Test set shape:', test_data.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PassengerId</th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Braund, Mr. Owen Harris</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Heikkinen, Miss. Laina</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>STON/O2. 3101282</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Allen, Mr. William Henry</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>373450</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   PassengerId  Survived  Pclass  \\\n",
       "0            1         0       3   \n",
       "1            2         1       1   \n",
       "2            3         1       3   \n",
       "3            4         1       1   \n",
       "4            5         0       3   \n",
       "\n",
       "                                                Name     Sex   Age  SibSp  \\\n",
       "0                            Braund, Mr. Owen Harris    male  22.0      1   \n",
       "1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   \n",
       "2                             Heikkinen, Miss. Laina  female  26.0      0   \n",
       "3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   \n",
       "4                           Allen, Mr. William Henry    male  35.0      0   \n",
       "\n",
       "   Parch            Ticket     Fare Cabin Embarked  \n",
       "0      0         A/5 21171   7.2500   NaN        S  \n",
       "1      0          PC 17599  71.2833   C85        C  \n",
       "2      0  STON/O2. 3101282   7.9250   NaN        S  \n",
       "3      0            113803  53.1000  C123        S  \n",
       "4      0            373450   8.0500   NaN        S  "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_data.head()  # show the first 5 rows of the training data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 891 entries, 0 to 890\n",
      "Data columns (total 12 columns):\n",
      " #   Column       Non-Null Count  Dtype  \n",
      "---  ------       --------------  -----  \n",
      " 0   PassengerId  891 non-null    int64  \n",
      " 1   Survived     891 non-null    int64  \n",
      " 2   Pclass       891 non-null    int64  \n",
      " 3   Name         891 non-null    object \n",
      " 4   Sex          891 non-null    object \n",
      " 5   Age          714 non-null    float64\n",
      " 6   SibSp        891 non-null    int64  \n",
      " 7   Parch        891 non-null    int64  \n",
      " 8   Ticket       891 non-null    object \n",
      " 9   Fare         891 non-null    float64\n",
      " 10  Cabin        204 non-null    object \n",
      " 11  Embarked     889 non-null    object \n",
      "dtypes: float64(2), int64(5), object(5)\n",
      "memory usage: 83.7+ KB\n",
      "None\n",
      "________________________________________\n",
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 418 entries, 0 to 417\n",
      "Data columns (total 11 columns):\n",
      " #   Column       Non-Null Count  Dtype  \n",
      "---  ------       --------------  -----  \n",
      " 0   PassengerId  418 non-null    int64  \n",
      " 1   Pclass       418 non-null    int64  \n",
      " 2   Name         418 non-null    object \n",
      " 3   Sex          418 non-null    object \n",
      " 4   Age          332 non-null    float64\n",
      " 5   SibSp        418 non-null    int64  \n",
      " 6   Parch        418 non-null    int64  \n",
      " 7   Ticket       418 non-null    object \n",
      " 8   Fare         417 non-null    float64\n",
      " 9   Cabin        91 non-null     object \n",
      " 10  Embarked     418 non-null    object \n",
      "dtypes: float64(2), int64(4), object(5)\n",
      "memory usage: 36.0+ KB\n",
      "None\n"
     ]
    }
   ],
   "source": [
    "print(train_data.info())  # summary of the training data (info() prints its report and returns None)\n",
    "print('_'*40)             # separator\n",
    "print(test_data.info())   # summary of the test data"
   ]
  },
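  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Variable descriptions**\n",
    "\n",
    "* PassengerId: passenger ID\n",
    "* Survived: survival (1 = survived, 0 = did not survive)\n",
    "* Pclass: ticket class (1 = first class, 2 = second class, 3 = third class)\n",
    "* Name: passenger name\n",
    "* Sex: sex\n",
    "* Age: age in years\n",
    "* SibSp: number of siblings and spouses aboard\n",
    "* Parch: number of parents and children aboard\n",
    "* Ticket: ticket number\n",
    "* Fare: ticket fare\n",
    "* Cabin: cabin number\n",
    "* Embarked: port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)"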
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "       PassengerId    Survived      Pclass         Age       SibSp  \\\n",
      "count   891.000000  891.000000  891.000000  714.000000  891.000000   \n",
      "mean    446.000000    0.383838    2.308642   29.699118    0.523008   \n",
      "std     257.353842    0.486592    0.836071   14.526497    1.102743   \n",
      "min       1.000000    0.000000    1.000000    0.420000    0.000000   \n",
      "25%     223.500000    0.000000    2.000000   20.125000    0.000000   \n",
      "50%     446.000000    0.000000    3.000000   28.000000    0.000000   \n",
      "75%     668.500000    1.000000    3.000000   38.000000    1.000000   \n",
      "max     891.000000    1.000000    3.000000   80.000000    8.000000   \n",
      "\n",
      "            Parch        Fare  \n",
      "count  891.000000  891.000000  \n",
      "mean     0.381594   32.204208  \n",
      "std      0.806057   49.693429  \n",
      "min      0.000000    0.000000  \n",
      "25%      0.000000    7.910400  \n",
      "50%      0.000000   14.454200  \n",
      "75%      0.000000   31.000000  \n",
      "max      6.000000  512.329200  \n"
     ]
    }
   ],
   "source": [
    "# First look at summary statistics for the training set\n",
    "print(train_data.describe())"
   ]
  },
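  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `Age` column has only 714 values, while the other numeric columns have 891. `Age` therefore contains nulls: 177 values are missing.\n",
    "\n",
    "Age is expected to affect survival, so the column must be kept rather than dropped, and the missing values need to be filled in.\n",
    "\n",
    "What values are reasonable to fill in without distorting the data? A standard approach is mean/mode completion (Mean/Mode Completer), which handles numeric and non-numeric attributes separately:\n",
    "\n",
    "* If the missing value is numeric, fill it with the mean of that attribute over all other records.\n",
    "* If the missing value is non-numeric, fill it with the mode, that is, the value of that attribute that occurs most often in the other records.\n",
    "\n",
    "The code below fills `Age` with the median rather than the mean; the median is less sensitive to extreme ages than the mean."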
   ]
  },
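  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The difference between mean, median, and mode completion can be seen on a small example. This is a minimal sketch on a hypothetical toy table (the name `toy` and its values are made up for illustration and are not taken from `train.csv`):\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data, not taken from train.csv\n",
    "toy = pd.DataFrame({\n",
    "    'Age': [22.0, 38.0, None, 35.0, 80.0],\n",
    "    'Embarked': ['S', 'C', 'S', None, 'S'],\n",
    "})\n",
    "\n",
    "age_mean = toy['Age'].mean()           # (22 + 38 + 35 + 80) / 4 = 43.75\n",
    "age_median = toy['Age'].median()       # midpoint of 35 and 38 = 36.5\n",
    "port_mode = toy['Embarked'].mode()[0]  # most frequent value: 'S'\n",
    "\n",
    "# Numeric column: fill with the median; text column: fill with the mode\n",
    "filled = toy.assign(\n",
    "    Age=toy['Age'].fillna(age_median),\n",
    "    Embarked=toy['Embarked'].fillna(port_mode),\n",
    ")\n",
    "print(filled)\n",
    "```\n",
    "\n",
    "In this toy table the single age of 80 pulls the mean up to 43.75 while the median stays at 36.5, which is why the median can be the safer fill value for a skewed column."
   ]
  },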
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "       PassengerId    Survived      Pclass         Age       SibSp  \\\n",
      "count   891.000000  891.000000  891.000000  891.000000  891.000000   \n",
      "mean    446.000000    0.383838    2.308642   29.361582    0.523008   \n",
      "std     257.353842    0.486592    0.836071   13.019697    1.102743   \n",
      "min       1.000000    0.000000    1.000000    0.420000    0.000000   \n",
      "25%     223.500000    0.000000    2.000000   22.000000    0.000000   \n",
      "50%     446.000000    0.000000    3.000000   28.000000    0.000000   \n",
      "75%     668.500000    1.000000    3.000000   35.000000    1.000000   \n",
      "max     891.000000    1.000000    3.000000   80.000000    8.000000   \n",
      "\n",
      "            Parch        Fare  \n",
      "count  891.000000  891.000000  \n",
      "mean     0.381594   32.204208  \n",
      "std      0.806057   49.693429  \n",
      "min      0.000000    0.000000  \n",
      "25%      0.000000    7.910400  \n",
      "50%      0.000000   14.454200  \n",
      "75%      0.000000   31.000000  \n",
      "max      6.000000  512.329200  \n"
     ]
    }
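   ],
   "source": [
    "# Machine learning models need a complete M x N input matrix, but Age has missing values\n",
    "\n",
    "# fillna() fills the missing entries; median() returns the median of the column (not the mean)\n",
    "train_data['Age'] = train_data['Age'].fillna(train_data['Age'].median())\n",
    "\n",
    "print(train_data.describe())  # check how the table changed\n",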
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `Age` column now has 891 values, so this column is done.\n",
    "\n",
    "Next, check whether the other columns need any processing.\n",
    "\n",
    "The `Sex` column holds the strings `female` and `male`, as the following code shows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['male' 'female']\n"
     ]
    }
   ],
   "source": [
    "print(train_data['Sex'].unique())  # unique() returns the distinct values in order of first appearance (not sorted)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here `male` and `female` are replaced with 0 and 1 respectively."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_data.loc[train_data['Sex']=='male','Sex']=0\n",
    "\n",
    "train_data.loc[train_data['Sex']=='female','Sex']=1\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `Embarked` column is processed in the same way:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['S' 'C' 'Q' nan]\n"
     ]
    }
   ],
   "source": [
    "print(train_data['Embarked'].unique())\n",
    "\n",
    "# Fill the 2 missing values with 'S', the most frequent port (mode completion)\n",
    "train_data['Embarked'] = train_data['Embarked'].fillna('S')\n",
    "\n",
    "# Encode the ports as integers: S -> 0, C -> 1, Q -> 2\n",
    "train_data.loc[train_data['Embarked'] == 'S', 'Embarked'] = 0\n",
    "train_data.loc[train_data['Embarked'] == 'C', 'Embarked'] = 1\n",
    "train_data.loc[train_data['Embarked'] == 'Q', 'Embarked'] = 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PassengerId</th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Braund, Mr. Owen Harris</td>\n",
       "      <td>0</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
       "      <td>1</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Heikkinen, Miss. Laina</td>\n",
       "      <td>1</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>STON/O2. 3101282</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>1</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Allen, Mr. William Henry</td>\n",
       "      <td>0</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>373450</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   PassengerId  Survived  Pclass  \\\n",
       "0            1         0       3   \n",
       "1            2         1       1   \n",
       "2            3         1       3   \n",
       "3            4         1       1   \n",
       "4            5         0       3   \n",
       "\n",
       "                                                Name Sex   Age  SibSp  Parch  \\\n",
       "0                            Braund, Mr. Owen Harris   0  22.0      1      0   \n",
       "1  Cumings, Mrs. John Bradley (Florence Briggs Th...   1  38.0      1      0   \n",
       "2                             Heikkinen, Miss. Laina   1  26.0      0      0   \n",
       "3       Futrelle, Mrs. Jacques Heath (Lily May Peel)   1  35.0      1      0   \n",
       "4                           Allen, Mr. William Henry   0  35.0      0      0   \n",
       "\n",
       "             Ticket     Fare Cabin Embarked  \n",
       "0         A/5 21171   7.2500   NaN        0  \n",
       "1          PC 17599  71.2833   C85        1  \n",
       "2  STON/O2. 3101282   7.9250   NaN        0  \n",
       "3            113803  53.1000  C123        0  \n",
       "4            373450   8.0500   NaN        0  "
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Sex and Embarked were already encoded above, so these assignments change nothing;\n",
    "# head() confirms that both columns now hold integer codes\n",
    "train_data.loc[train_data[\"Sex\"] == \"male\", \"Sex\"] = 0\n",
    "train_data.loc[train_data[\"Sex\"] == \"female\", \"Sex\"] = 1\n",
    "train_data.loc[train_data[\"Embarked\"] == \"S\", \"Embarked\"] = 0\n",
    "train_data.loc[train_data[\"Embarked\"] == \"C\", \"Embarked\"] = 1\n",
    "train_data.loc[train_data[\"Embarked\"] == \"Q\", \"Embarked\"] = 2\n",
    "train_data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The replacement succeeded."
   ]
  },
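  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same category-to-integer encoding can also be expressed with a lookup table. The following is a minimal sketch, assuming a hypothetical toy table `toy`; the dictionaries `sex_codes` and `port_codes` are illustrative names that reuse the codes chosen in this report (male 0, female 1; S 0, C 1, Q 2):\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data, not taken from train.csv\n",
    "toy = pd.DataFrame({\n",
    "    'Sex': ['male', 'female', 'female'],\n",
    "    'Embarked': ['S', 'C', 'Q'],\n",
    "})\n",
    "\n",
    "# Lookup tables with the same codes as in this report\n",
    "sex_codes = {'male': 0, 'female': 1}\n",
    "port_codes = {'S': 0, 'C': 1, 'Q': 2}\n",
    "\n",
    "# Series.map replaces each value by its dictionary entry\n",
    "encoded = toy.assign(\n",
    "    Sex=toy['Sex'].map(sex_codes),\n",
    "    Embarked=toy['Embarked'].map(port_codes),\n",
    ")\n",
    "print(encoded)\n",
    "```\n",
    "\n",
    "`Series.map` turns any value that is missing from the dictionary into NaN, so an unexpected category shows up as a missing value instead of silently staying as text."
   ]
  },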
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3: Analyzing correlations in the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<AxesSubplot:xlabel='Sex'>"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAEDCAYAAAAlRP8qAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAOkklEQVR4nO3db4ydaV2H8evLLI0iEdSOsvYPbaSwKQYIjEUSCBCzobtgChFDFyNBxEmN1WCiocYEE/EFGxKDSHHSkIb4hmoCwgQGKpKwqECcWbNu6K7FSQU6FsMsEGCRULr8fDEHPZw9Z84z7Zme7d3rk0xynue5c+b3onvtk3vOn1QVkqQb3+OmPYAkaTIMuiQ1wqBLUiMMuiQ1wqBLUiMMuiQ14pZp/eKdO3fWvn37pvXrJemGdO+99z5UVbPDrk0t6Pv27WNlZWVav16SbkhJvjjqmlsuktQIgy5JjTDoktQIgy5JjTDoktQIgy5JjTDoktQIgy5JjZjaG4skXZt9Jz4y7RGa8oW3vXzaI1wz79AlqREGXZIaYdAlqREGXZIaYdAlqREGXZIaYdAlqRGdgp7kcJLzSVaTnBhy/Q+T3Nf7+VySR5L85OTHlSSNMjboSWaAk8AdwEHgriQH+9dU1dur6jlV9Rzgj4B7qupr2zCvJGmELnfoh4DVqrpQVZeBM8CRTdbfBbxvEsNJkrrrEvRdwMW+47XeuUdJ8gTgMPD+ax9NkrQVXYKeIedqxNpfBv551HZLkvkkK0lW1tfXu84oSeqgS9DXgD19x7uBSyPWHmWT7ZaqOlVVc1U1Nzs7231KSdJYXYK+DBxIsj/JDjaivTi4KMmTgBcDH5rsiJKkLsZ+fG5VXUlyHDgLzACnq+pckmO96wu9pa8C/r6qvr1t00qSRur0eehVtQQsDZxbGDh+L/DeSQ0mSdoa3ykqSY0w6JLUCIMuSY0w6JLUCIMuSY0w6JLUCIMuSY0w6JLUCIMuSY0w6JLUCIMuSY0w6JLUCIMuSY0w6JLUCIMuSY0w6JLUCIMuSY0w6JLUCIMuSY3oFPQkh5OcT7Ka5MSINS9Jcl+Sc0numeyYkqRxxn5JdJIZ4CRwO7AGLCdZrKoH+tY8GXg3cLiqvpTkp7dpXknSCF3u0A8Bq1V1oaouA2eAIwNrXgt8oKq+BFBVX5nsmJKkcboEfRdwse94rXeu39OBn0jyyST3JnndpAaUJHUzdssFyJBzNeR5ngf8EvCjwGeSfLaqPv9DT5TMA/MAe/fu3fq0kqSRutyhrwF7+o53A5eGrPlYVX27qh4CPgU8e/CJqupUVc1V1dzs7OzVzixJGqJL0JeBA0n2J9kBHAUWB9Z8CHhRkluSPAF4PvDgZEeVJG1m7JZLVV1Jchw4C8wAp6vqXJJjvesLVfVgko8B9wPfB95TVZ/bzsElST+syx46VbUELA2cWxg4fjvw9smNJknaCt8pKkmNMOiS1AiDLkmNMOiS1AiDLkmNMOiS1AiDLkmNMOiS1AiDLkmNMOiS1AiDLkmNMOiS1AiDLkmNMOiS1AiDLkmNMOiS1AiDLkmNMOiS1AiDLkmN6BT0JIeTnE+ymuTEkOsvSfKNJPf1ft4y+VElSZsZ+yXRSWaAk8DtwBqwnGSxqh4YWPqPVfWKbZhRktRBlzv0Q8BqVV2oqsvAGeDI9o4lSdqqLkHfBVzsO17rnRv0giT/luSjSZ45kekkSZ2N3XIBMuRcDRz/K/DUqno4yZ3AB4EDj3qiZB6YB9i7d+/WJpUkbarLHfoasKfveDdwqX9BVX2zqh7uPV4CHp9k5+ATVdWpqpqrqrnZ2dlrGFuSNKhL0JeBA0n2J9kBHAUW+xckeUqS9B4f6j3vVyc9rCRptLFbLlV1Jclx4CwwA5yuqnNJjvWuLwCvBn47yRXgO8DRqhrclpEkbaMue+g/2EZZGji30Pf4XcC7JjuaJGkrfKeoJDXCoEtSIwy6JDXCoEtS
Iwy6JDXCoEtSIwy6JDXCoEtSIwy6JDXCoEtSIwy6JDXCoEtSIwy6JDXCoEtSIwy6JDXCoEtSIwy6JDXCoEtSIwy6JDWiU9CTHE5yPslqkhObrPuFJI8kefXkRpQkdTE26ElmgJPAHcBB4K4kB0esuxs4O+khJUnjdblDPwSsVtWFqroMnAGODFn3u8D7ga9McD5JUkddgr4LuNh3vNY793+S7AJeBSxMbjRJ0lZ0CXqGnKuB43cAb66qRzZ9omQ+yUqSlfX19Y4jSpK6uKXDmjVgT9/xbuDSwJo54EwSgJ3AnUmuVNUH+xdV1SngFMDc3Nzg/xQkSdegS9CXgQNJ9gP/BRwFXtu/oKr2/+BxkvcCHx6MuSRpe40NelVdSXKcjVevzACnq+pckmO96+6bS9JjQJc7dKpqCVgaODc05FX1+msfS5K0Vb5TVJIaYdAlqREGXZIaYdAlqREGXZIaYdAlqREGXZIaYdAlqREGXZIaYdAlqREGXZIaYdAlqREGXZIaYdAlqREGXZIaYdAlqREGXZIaYdAlqREGXZIa0SnoSQ4nOZ9kNcmJIdePJLk/yX1JVpK8cPKjSpI2M/ZLopPMACeB24E1YDnJYlU90LfsE8BiVVWSZwF/C9y2HQNLkobrcod+CFitqgtVdRk4AxzpX1BVD1dV9Q5/DCgkSddVl6DvAi72Ha/1zv2QJK9K8u/AR4A3TGY8SVJXXYKeIecedQdeVX9XVbcBrwTeOvSJkvneHvvK+vr6lgaVJG2uS9DXgD19x7uBS6MWV9WngJ9LsnPItVNVNVdVc7Ozs1seVpI0WpegLwMHkuxPsgM4Ciz2L0jytCTpPX4usAP46qSHlSSNNvZVLlV1Jclx4CwwA5yuqnNJjvWuLwC/ArwuyfeA7wCv6fsjqSTpOhgbdICqWgKWBs4t9D2+G7h7sqNJkrbCd4pKUiMMuiQ1wqBLUiMMuiQ1wqBLUiMMuiQ1wqBLUiMMuiQ1wqBLUiMMuiQ1wqBLUiMMuiQ1wqBLUiMMuiQ1wqBLUiMMuiQ1wqBLUiMMuiQ1wqBLUiM6BT3J4STnk6wmOTHk+q8lub/38+kkz578qJKkzYwNepIZ4CRwB3AQuCvJwYFl/wm8uKqeBbwVODXpQSVJm+tyh34IWK2qC1V1GTgDHOlfUFWfrqqv9w4/C+ye7JiSpHG6BH0XcLHveK13bpTfBD56LUNJkrbulg5rMuRcDV2YvJSNoL9wxPV5YB5g7969HUeUJHXR5Q59DdjTd7wbuDS4KMmzgPcAR6rqq8OeqKpOVdVcVc3Nzs5ezbySpBG6BH0ZOJBkf5IdwFFgsX9Bkr3AB4Bfr6rPT35MSdI4Y7dcqupKkuPAWWAGOF1V55Ic611fAN4C/BTw7iQAV6pqbvvGvn72nfjItEdoyhfe9vJpjyA1q8seOlW1BCwNnFvoe/xG4I2THU2StBW+U1SSGmHQJakRBl2SGmHQJakRBl2SGmHQJakRBl2SGmHQJakRBl2SGmHQJakRBl2SGmHQJakRBl2SGmHQJakRBl2SGmHQJakRBl2SGmHQJakRBl2SGtEp6EkOJzmfZDXJiSHXb0vymSTfTfIHkx9TkjTO2C+JTjIDnARuB9aA5SSLVfVA37KvAb8HvHI7hpQkjdflDv0QsFpVF6rqMnAGONK/oKq+UlXLwPe2YUZJUgddgr4LuNh3vNY7J0l6DOkS9Aw5V1fzy5LMJ1lJsrK+vn41TyFJGqFL0NeAPX3Hu4FLV/PLqupUVc1V1dzs7OzVPIUkaYQuQV8GDiTZn2QHcBRY3N6xJElbNfZVLlV1Jclx4CwwA5yuqnNJjvWuLyR5CrAC/Djw/SRvAg5W1Te3b3RJUr+xQQeoqiVgaeDcQt/j/2ZjK0aSNCW+U1SSGmHQJakRBl2SGmHQJakRBl2SGmHQJakRBl2SGmHQJakRBl2SGmHQJakRBl2SGmHQJakRBl2SGmHQJakRBl2SGmHQJakRBl2SGmHQ
JakRBl2SGtEp6EkOJzmfZDXJiSHXk+Sdvev3J3nu5EeVJG1mbNCTzAAngTuAg8BdSQ4OLLsDOND7mQf+asJzSpLG6HKHfghYraoLVXUZOAMcGVhzBPjr2vBZ4MlJbp3wrJKkTdzSYc0u4GLf8Rrw/A5rdgFf7l+UZJ6NO3iAh5Oc39K02sxO4KFpDzFO7p72BJoC/21O1lNHXegS9Aw5V1exhqo6BZzq8Du1RUlWqmpu2nNIg/y3ef102XJZA/b0He8GLl3FGknSNuoS9GXgQJL9SXYAR4HFgTWLwOt6r3b5ReAbVfXlwSeSJG2fsVsuVXUlyXHgLDADnK6qc0mO9a4vAEvAncAq8D/Ab2zfyBrBrSw9Vvlv8zpJ1aO2uiVJNyDfKSpJjTDoktQIgy5JjejyOnQ9BiW5jY136O5i4zX/l4DFqnpwqoNJmhrv0G9ASd7MxkcwBPgXNl5aGuB9wz48TXosSOKr37aZr3K5ASX5PPDMqvrewPkdwLmqOjCdyaTRknypqvZOe46WueVyY/o+8LPAFwfO39q7Jk1FkvtHXQJ+5nrOcjMy6DemNwGfSPIf/P+Hou0FngYcn9ZQEhvRfhnw9YHzAT59/ce5uRj0G1BVfSzJ09n4aONdbPzHsgYsV9UjUx1ON7sPA0+sqvsGLyT55HWf5ibjHrokNcJXuUhSIwy6JDXCoOumlOSPk5zrfan5fUkGv4VLuuH4R1HddJK8AHgF8Nyq+m6SncCOKY8lXTPv0HUzuhV4qKq+C1BVD1XVpSTPS3JPknuTnE1ya5InJTmf5BkASd6X5LemOr00gq9y0U0nyROBfwKeAPwD8DdsvEb6HuBIVa0neQ3wsqp6Q5LbgT8F/gJ4fVUdntLo0qbcctFNp6oeTvI84EXAS9kI+p8BPw98PAlsfDvXl3vrP57kV4GTwLOnMrTUgXfouukleTXwO8CPVNULhlx/HBt37/uBO6tq1NvbpalyD103nSTPSNL/AWbPAR4EZnt/MCXJ45M8s3f993vX7wJOJ3n89ZxX6so7dN10etstfwk8GbjCxpebzwO7gXcCT2JjO/IdbNyZfwg4VFXfSvLnwLeq6k+u/+TS5gy6JDXCLRdJaoRBl6RGGHRJaoRBl6RGGHRJaoRBl6RGGHRJaoRBl6RG/C9Qa6RX+HrKbQAAAABJRU5ErkJggg==",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
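   ],
   "source": [
    "# Relationship between sex and survival (0: male; 1: female)\n",
    "# Visualize the survival rate by sex\n",
    "# groupby() groups the rows by Sex; the mean of the 0/1 Survived column is the survival rate\n",
    "train_data.groupby('Sex')['Survived'].mean().plot.bar()  # draw a bar chart (not a histogram)"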
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Conclusion: female passengers had a higher survival rate than male passengers."
   ]
  },
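  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because `Survived` is coded as 0/1, the mean of each group equals that group's survival rate. A minimal sketch on hypothetical toy data (the table `toy` and its values are made up for illustration, not taken from `train.csv`):\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data: three men (one survivor) and two women (both survivors)\n",
    "toy = pd.DataFrame({\n",
    "    'Sex': [0, 0, 0, 1, 1],\n",
    "    'Survived': [0, 0, 1, 1, 1],\n",
    "})\n",
    "\n",
    "# Mean of a 0/1 column = survivors / group size\n",
    "rate = toy.groupby('Sex')['Survived'].mean()\n",
    "\n",
    "# The same ratio spelled out: number of survivors and group size per sex\n",
    "counts = toy.groupby('Sex')['Survived'].agg(['sum', 'count'])\n",
    "\n",
    "print(rate)\n",
    "print(counts)\n",
    "```\n",
    "\n",
    "Printing the counts next to the rates helps judge how much weight a rate deserves, since a rate computed from a handful of passengers is far less reliable than one computed from hundreds."
   ]
  },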
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<AxesSubplot:xlabel='Age_group'>"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAErCAYAAADOu3hxAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAVmUlEQVR4nO3df7RdZX3n8feHAO3SqiC5KoZoMh2EooUaL7/UDkirBWkLKm1DbdWpyMJZVKdT15jOWNTl2IGZ6Vr1B5pmOjjVrpkMVXFSiAbHaumItAk/ZPghriyKkoEugzCAaBuC3/nj7ODhcnLvuZdzcjgP79dad+XsZz93329Obj73uc/ez96pKiRJ02+/SRcgSRoNA12SGmGgS1IjDHRJaoSBLkmNMNAlqRH7D9MpyanAh4BlwJ9U1YUD+pwM/BFwAHBPVZ003zGXL19eq1atWly1kvQUd+21195TVTOD9i0Y6EmWARcDrwZ2AFuTbKqqW/r6HAR8DDi1qr6d5DkLHXfVqlVs27ZtyL+CJAkgybf2tm+YKZfjgO1VdXtV7QI2AmfM6fPrwGer6tsAVfWdpRYrSVqaYQJ9BXBn3/aOrq3fi4CDk3wlybVJ3jToQEnOTbItybadO3curWJJ0kDDBHoGtM29X8D+wMuA04FfAH4/yYse90lVG6pqtqpmZ2YGTgFJkpZomJOiO4CVfduHAXcN6HNPVT0EPJTkKuAY4JsjqVKStKBhRuhbgcOTrE5yILAW2DSnz/8EfjbJ/kmeBhwP3DraUiVJ81lwhF5Vu5OcD2yhd9niJVV1c5Lzuv3rq+rWJF8AbgR+SO/SxpvGWbgk6bEyqdvnzs7OlpctStLiJLm2qmYH7XOlqCQ1wkCXpEYMtfR/Wqxad8WkSxjKHReePukSJDXIEbokNcJAl6RGGOiS1AgDXZIaYaBLUiMMdElqhIEuSY0w0CWpEQa6JDXCQJekRhjoktQIA12SGmGgS1IjDHRJaoSBLkmNMNAlqREGuiQ1wkCXpEYY6JLUCANdkhphoEtSIwx0SWqEgS5JjTDQJakRQwV6klOT3JZke5J1A/afnOT+JDd0HxeMvlRJ0nz2X6hDkmXAxcCrgR3A1iSbquqWOV3/uqp+cQw1SpKGMMwI/Thge1XdXlW7gI3AGeMtS5K0WMME+grgzr7tHV3bXCcm+XqSzyd58aADJTk3ybYk23bu3LmEciVJezNMoGdAW83Zvg54YVUdA3wE+NygA1XVhqqararZmZmZRRUqSZrfMIG+A1jZt30YcFd/h6p6oKq+173eDByQZPnIqpQkLWiYQN8KHJ5kdZIDgbXApv4OSZ6XJN3r47rjfnfUxUqS9m7Bq1yqaneS84EtwDLgkqq6Ocl53f71wFnA25PsBn4ArK2qudMykqQxWjDQ4dFplM1z2tb3vf4o8NHRliZJWgxXikpSIwx0SWqEgS5JjTDQJakRBrokNcJAl6RGGOiS1AgDXZIaYaBLUiMMdElqhIEuSY0w0CWpEQa6JDXCQJekRhjoktQIA12SGmGgS1IjDHRJaoSBLkmNMNAlqRFDPSRaT02r1l0x6RKGcseFp0+6BOlJwRG6JDXCQJekRhjoktQIA12SGmGgS1IjDHRJasRQgZ7k1CS3JdmeZN08/Y5N8kiSs0ZXoiRpGAsGepJlwMXAacBRwNlJjtpLv4uALaMuUpK0sGFG6McB26vq9qraBWwEzhjQ77eBzwDfGWF9kqQhDRPoK4A7+7Z3dG2PSrICeB2wfr4DJTk3ybYk23bu3LnYWiVJ8xgm0DOgreZs/xHw7qp6ZL4DVdWGqpqtqtmZmZkhS5QkDWOYe7nsAFb2bR8G3DWnzyywMQnAcuC1SXZX1edGUaQkaWHDBPpW4PAkq4H/C6wFfr2/Q1Wt3vM6yX8FLjfMJWnfWjDQq2p3kvPpXb2yDLikqm5Ocl63f955c0nSvjHU7XOrajOweU7bwCCvqrc8
8bIkSYvlSlFJaoSBLkmNMNAlqREGuiQ1wkCXpEYY6JLUCANdkhox1HXokp64VeuumHQJC7rjwtMnXYKeAEfoktQIA12SGmGgS1IjDHRJaoSBLkmNMNAlqREGuiQ1wkCXpEYY6JLUCANdkhphoEtSIwx0SWqEgS5JjTDQJakRBrokNcJAl6RGGOiS1AgDXZIaMVSgJzk1yW1JtidZN2D/GUluTHJDkm1JXjn6UiVJ81nwmaJJlgEXA68GdgBbk2yqqlv6un0J2FRVleRo4FLgyHEULEkabJgR+nHA9qq6vap2ARuBM/o7VNX3qqq6zacDhSRpnxom0FcAd/Zt7+jaHiPJ65J8A7gC+K1BB0pybjcls23nzp1LqVeStBfDBHoGtD1uBF5Vl1XVkcCZwAcGHaiqNlTVbFXNzszMLKpQSdL8hgn0HcDKvu3DgLv21rmqrgJ+MsnyJ1ibJGkRhgn0rcDhSVYnORBYC2zq75DknyZJ93oNcCDw3VEXK0nauwWvcqmq3UnOB7YAy4BLqurmJOd1+9cDbwDelORh4AfAr/WdJJUk7QMLBjpAVW0GNs9pW9/3+iLgotGWJklaDFeKSlIjDHRJaoSBLkmNMNAlqREGuiQ1wkCXpEYY6JLUCANdkhphoEtSIwx0SWqEgS5JjTDQJakRBrokNcJAl6RGGOiS1AgDXZIaYaBLUiMMdElqhIEuSY0w0CWpEQa6JDXCQJekRhjoktQIA12SGmGgS1IjDHRJaoSBLkmNGCrQk5ya5LYk25OsG7D/jUlu7D6uTnLM6EuVJM1nwUBPsgy4GDgNOAo4O8lRc7r9HXBSVR0NfADYMOpCJUnzG2aEfhywvapur6pdwEbgjP4OVXV1Vd3XbV4DHDbaMiVJCxkm0FcAd/Zt7+ja9uatwOcH7UhybpJtSbbt3Llz+ColSQsaJtAzoK0GdkxeRS/Q3z1of1VtqKrZqpqdmZkZvkpJ0oL2H6LPDmBl3/ZhwF1zOyU5GvgT4LSq+u5oypMkDWuYEfpW4PAkq5McCKwFNvV3SPIC4LPAb1bVN0dfpiRpIQuO0Ktqd5LzgS3AMuCSqro5yXnd/vXABcAhwMeSAOyuqtnxlS1JmmuYKReqajOweU7b+r7X5wDnjLY0SdJiuFJUkhphoEtSIwx0SWqEgS5JjTDQJakRBrokNcJAl6RGGOiS1AgDXZIaYaBLUiMMdElqhIEuSY0w0CWpEQa6JDXCQJekRhjoktQIA12SGmGgS1IjDHRJaoSBLkmNMNAlqRH7T7oASVqsVeuumHQJQ7njwtP36ddzhC5JjTDQJakRBrokNcJAl6RGGOiS1IihAj3JqUluS7I9yboB+49M8rUk/5jkXaMvU5K0kAUvW0yyDLgYeDWwA9iaZFNV3dLX7V7gHcCZ4yhSkrSwYUboxwHbq+r2qtoFbATO6O9QVd+pqq3Aw2OoUZI0hGECfQVwZ9/2jq5t0ZKcm2Rbkm07d+5cyiEkSXsxTKBnQFst5YtV1Yaqmq2q2ZmZmaUcQpK0F8ME+g5gZd/2YcBd4ylHkrRUwwT6VuDwJKuTHAisBTaNtyxJ0mIteJVLVe1Ocj6wBVgGXFJVNyc5r9u/PsnzgG3AM4EfJvmXwFFV9cD4Spck9RvqbotVtRnYPKdtfd/rv6c3FSNJmhBXikpSIwx0SWqEgS5JjTDQJakRBrokNcJAl6RGGOiS1AgDXZIaYaBLUiMMdElqhIEuSY0w0CWpEQa6JDXCQJekRhjoktQIA12SGmGgS1IjDHRJaoSBLkmNMNAlqREGuiQ1wkCXpEYY6JLUCANdkhphoEtSIwx0SWqEgS5JjRgq0JOcmuS2JNuTrBuwP0k+3O2/Mcma0ZcqSZrPgoGeZBlwMXAacBRwdpKj5nQ7DTi8+zgX+PiI65QkLWCYEfpxwPaqur2qdgEbgTPm9DkD+GT1XAMclOTQEdcqSZrH/kP0WQHc2be9Azh+iD4rgLv7OyU5l94IHuB7SW5b
VLWTsRy4Z5QHzEWjPNrU8f0cHd/L0ZqW9/OFe9sxTKBnQFstoQ9VtQHYMMTXfNJIsq2qZiddRyt8P0fH93K0Wng/h5ly2QGs7Ns+DLhrCX0kSWM0TKBvBQ5PsjrJgcBaYNOcPpuAN3VXu5wA3F9Vd889kCRpfBaccqmq3UnOB7YAy4BLqurmJOd1+9cDm4HXAtuB7wP/fHwl73NTNUU0BXw/R8f3crSm/v1M1eOmuiVJU8iVopLUCANdkhphoEtSIwx0jU2S/ZK8fNJ1tCTJrwzTpoUlWZbkdyZdxyh5UrRPkmfPt7+q7t1XtbQiydeq6sRJ19GKJNdV1ZqF2jScJF+pqpMnXceoDLNS9KnkWnorXAO8ALive30Q8G1g9cQqm15XJnkD8Nly9LBkSU6jd2nwiiQf7tv1TGD3ZKpqwleTfBT4H8BDexqr6rrJlbR0jtAHSLIe2FRVm7vt04Cfr6rfnWxl0yfJg8DTgUeAH9D7AVlV9cyJFjZlkhwDvBR4P3BB364HgS9X1X0TKWzKJfnygOaqqlP2eTEjYKAPkOTaqnrZnLapv8+Dpl+S/avKEbkGcsplsHuSvAf4M3pTML8BfHeyJU2nJAHeCKyuqg8kWQkcWlV/O+HSpkqS/0N3w7veW/pYVXX0vq6pBUmeC/wB8PyqOq171sOJVfVfJlzakjhCH6A7Ofpe4J91TVcB7/ek6OIl+TjwQ+CUqvqpJAcDV1bVsRMubaok2estUwGq6lv7qpaWJPk88Ang31bVMUn2B66vqp+ecGlL4gh9gC643znpOhpxfFWtSXI9QFXd193kTYtgYI/N8qq6NMnvwaP3rnpk0kUtlYHeJ8lfMOA+7ntU1S/vw3Ja8XD3GMM90wUz9EbsWoLuJPOe79EDgQOAhzzJvGQPJTmEH31/ngDcP9mSls5Af6z/NOkCGvRh4DLgOUk+CJwFvGeyJU2vqnpG/3aSM+k9JlJL86/o3f77J5N8FZih9z06lZxD19glORL4OXqXLH6pqm6dcElNSXJNVZ0w6TqmVTdvfgS978/bqurhCZe0ZAZ6nySXVtWv9l9R0M8rCRZvL6tvH5zm/zSTlOT1fZv7AbPASa7GXZw57+PjVNVn91Uto+SUy2PtORH6ixOtoi3X0Xs8Yf+q27uTfAd4W1VdO8HaptEv9b3eDdwBnDGZUqbanvfxOcDLgb/stl8FfAWYykB3hK6x6lbdXlZVW7rt1wCnApcCH6qq4ydZn57aklxOb2Bxd7d9KHBxVc07gn+yMtAH6H4du4jeT+/gcvUlG7TCdk9bkhuq6mcmVNpUSfIR5r8C6x37sJxmJLmpql7St70fcGN/2zRxymWw/wD8kifvRuLeJO8GNnbbvwbc113K6OWLw9vW/fkK4Ch6N5MC+BV6N5XT0nwlyRbgv9P7gbkWGHR/l6ngCH2AJF+tqldMuo4WJFlOb9XtK+n9pvO/6d1g6n7gBVW1fYLlTZ3uZlKv2XNSOckB9FbevmqylU2v7jfyn+02r6qqyyZZzxNhoPfpO/N9EvA84HPAP+7ZP61nvtWOJLfRu9fIvd32wcA1VXXEZCvTk4FTLo+158x3Ad8HXtO3r5jSM9+T1K0M/dfAi4Ef39M+rbcnfRK4ELi+77avJwHvm1w506lvxW147LmJqT5f5gh9gCR/Cryzqv5ft30w8IdV9VsTLWwKJbmS3nzvu4DzgDcDO6vq3RMtbIoleT7wm8CtwNOAu6rqqslWNb2S/AyPnXL5+gTLeUJ8puhgR+8Jc+jdUIrewwW0eId0tyJ9uKr+qvuh6KrGJUpyDrAFWAf8Dr07Bb5vkjVNsyTvAD4FLKe37P9TSX57slUtnYE+2H7dqBx4dLWj01NLs2dF6N1JTk/yUuCwSRY05d4JHAt8qzsR+lJg52RLmmrnACdU1Xur6gLgROBtE65pyQypwf4QuDrJp+nNr/0q8MHJljS1/l2SZwG/C3yE3jMwm3rS+j72D1X1D0lI8mNV
9Y0knhBdutB7POIej3RtU8lAH6CqPplkG3AKvX/c11fVLRMuaypV1eXdy/vpLavWE7MjyUH0rsD6YpL7gLsmWtF0+wTwN0n2XKp4JjCVTysCT4pqzLqrXN4GrKJvAOEJ5icuyUnAs4AvVNWuSdczrZKs4UfrJK6qqusnXNKSGegaqyRXA39NbzXjo7/aVtVnJlaU1CgDXWPl/VqkfcerXDRulyd57aSLkJ4KHKFrrLoVeU8HdnUfU70ST3oyM9AlqRFOuWis0vMbSX6/216ZxIcaS2PgCF1jleTj9O57fkpV/VS3AvfKqjp2wqVJzXFhkcbt+Kpak+R66N0XJ8mBky5KapFTLhq3h7unExU8utDIJxVJY2Cga9w+DFwGPDfJB+k9segPJluS1Cbn0DV2SY4Efo7eJYtf8lmt0ng4Qte+sBz4flV9FLgnyepJFyS1yBG6xirJe4FZ4IiqelH3tJ0/9yHc0ug5Qte4vQ74ZeAhgKq6C3jGRCuSGmWga9x2Ve/XwD1XuTx9wvVIzTLQNW6XJvlj4KAkbwP+F/CfJ1yT1CQXFmncZoBPAw8ARwAXAD8/0YqkRnlSVGOV5LqqWjOn7caqOnpSNUmtcoSusUjyduBfAP8kyY19u54BfHUyVUltc4SusUjyLOBg4N8D6/p2PVhV906mKqltBrokNcKrXCSpEQa6JDXCQJekRhjomjpJXpekurs4SuoY6JpGZ9O7r/raff2Fk3ipr560DHRNlSQ/AbwCeCtdoCfZL8nHktyc5PIkm5Oc1e17WZK/SnJtki1JDp3n2McmuTHJ15L8xyQ3de1vSfLnSf4CuDLJs5N8rut7TZKju37vS/KuvuPdlGRV9/GNJH/afc6nkzxtfO+SnqoMdE2bM4EvVNU3gXuTrAFeD6wCfho4BzgRIMkBwEeAs6rqZcAlwAfnOfYngPOq6kTgkTn7TgTeXFWnAO8Hru9Wu/4b4JND1H0EsKH7nAfoLbqSRspA17Q5G9jYvd7Ybb+S3j3Wf1hVfw98udt/BPAS4ItJbgDeAxw26KBJDgKeUVVXd03/bU6XL/YtiHol8CmAqvpL4JBuIdV87qyqPStk/6w7hjRSzgdqaiQ5BDgFeEmSAvY8fPqyvX0KcHM34l7w8Avsf2iBvgXs5rGDpB+fs39uf2mkHKFrmpwFfLKqXlhVq6pqJfB3wD3AG7q59OcCJ3f9bwNmkjw6BZPkxYMOXFX3AQ8mOaFrmu+E61XAG7tjngzcU1UPAHcAa7r2NUD/o/ZesKcOfnRSVxopA13T5GwePxr/DPB8YAdwE/DHwN8A91fVLno/BC5K8nXgBuDl8xz/rcCGJF+jNwq/fy/93gfMdjcduxB4c18tz+6md94OfLPvc24F3tx9zrOBjy/wd5UWzXu5qAlJfqKqvtdNy/wt8IpuPn3Rx+herwMOrap3jqC2VcDlVfWSJ3osaT7OoasVl3cnNg8EPrDYMO+cnuT36P2/+BbwltGVJ42fI3Q95SS5mN617P0+VFWfmEQ90qgY6JLUCE+KSlIjDHRJaoSBLkmNMNAlqRH/H6b7FBhZdajRAAAAAElFTkSuQmCC",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Analyze the relationship between age and survival\n",
    "# Ages span a wide range, so bin them into groups\n",
    "bins = [0,12,18,65,100]\n",
    "labels = ['child','teenager','adult','older']\n",
    "# Split passengers into four age groups: child, teenager, adult, and older\n",
    "train_data['Age_group'] = pd.cut(train_data['Age'],bins = bins,labels = labels) # pd.cut() bins the values; intervals are right-closed, e.g. (0,12]\n",
    "by_age = train_data.groupby('Age_group')['Survived'].mean()\n",
    "by_age.plot(kind = 'bar')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Conclusion: children have the highest survival rate, and the oldest group has the lowest."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<AxesSubplot:xlabel='Embarked', ylabel='count'>"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEGCAYAAACKB4k+AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAYzUlEQVR4nO3df5BV9Z3m8fdjg2CJv4DGIA3pToKpQMDO2pDJOlpEJ0LYLJisQFMbJasp3IhTpCo7WzJbUcxu71qJSdYymglZHTAhtJ0xLoRJmDhM1PLHit0OKjQykOBISy80mCFiVpT2s3/cw/EGLs1t6HNPt/28qm7dc773+z3nc73aj+e3IgIzMzOAM/IuwMzM+g+HgpmZpRwKZmaWciiYmVnKoWBmZqkheRdwOkaPHh21tbV5l2FmNqC0tbXtj4jqUp8N6FCora2ltbU17zLMzAYUSf98os+8+8jMzFIOBTMzSzkUzMwsNaCPKZiZ9bV33nmHjo4O3nrrrbxLOW3Dhw+npqaGoUOHlj3GoWBmVqSjo4NzzjmH2tpaJOVdzimLCA4cOEBHRwd1dXVlj/PuIzOzIm+99RajRo0a0IEAIIlRo0b1eovHoWBmdoyBHghHncr3cCiYmVnKoWBmVoampiYmT57M1KlTqa+v59lnnz3tZa5bt44777yzD6qDESNG9MlyBs2B5kv/4sG8S+i1tm9dn3cJZgY888wzrF+/nueff55hw4axf/9+3n777bLGHjlyhCFDSv+pnTNnDnPmzOnLUk+btxTMzE6is7OT0aNHM2zYMABGjx7NRRddRG1tLfv37wegtbWVGTNmALB8+XIWL17M1VdfzfXXX88nP/lJtm7dmi5vxowZtLW1sXLlSm655RYOHjxIbW0t7777LgB/+MMfGD9+PO+88w6/+c1vmDVrFpdeeimXX345L7/8MgC7du3iU5/6FNOmTePrX/96n31Xh4KZ2UlcffXV7N69m4svvpibb76Zxx9//KRj2traWLt2LT/5yU9obGykpaUFKATMnj17uPTSS9O+5513Hpdcckm63J///OfMnDmToUOHsnjxYu655x7a2tq46667uPnmmwFYunQpX/nKV3juuef4wAc+0Gff1aFgZnYSI0aMoK2tjRUrVlBdXc2CBQtYuXJlj2PmzJnDWWedBcD8+fP56U9/CkBLSwvz5s07rv+CBQt46KGHAGhubmbBggUcOnSIp59+mnnz5lFfX89NN91EZ2cnAE899RQLFy4E4Lrrruurrzp4jimYmZ2OqqoqZsyYwYwZM5gyZQqrVq1iyJAh6S6fY68HOPvss9PpcePGMWrUKF588UUeeughfvCDHxy3/Dlz5rBs2TJef/112trauPLKK3nzzTc5//zz2bx5c8masjh11lsKZmYnsX37dnbs2JHOb968mQ9+8IPU1tbS1tYGwMMPP9zjMhobG/nmN7/JwYMHmTJlynGfjxgxgunTp7N06VI+97nPUVVVxbnnnktdXV26lRERvPDCCwBcdtllNDc3A7B69eo++Z7gUDAzO6lDhw6xaNEiJk2axNSpU2lvb2f58uXcfvvtLF26lMsvv5yqqqoel3HttdfS3NzM/PnzT9hnwYIF/PjHP2bBggVp2+rVq7n//vu55JJLmDx5MmvXrgXg7rvv5t5772XatGkcPHiwb74ooIjos4VVWkNDQ5T7kB2fkmpm5di2bRsf+9jH8i6jz5T6PpLaIqKhVH9vKZiZWcqhYGZmqcxDQVKVpH+UtD6ZHynpUUk7kvcLivouk7RT0nZJM7OuzczM/lglthSWAtuK5m8FNkbERGBjMo+kSUAjMBmYBdwnqecjN2Zm1qcyDQVJNcC/Af5XUfNcYFUyvQq4pqi9OSIOR8QuYCcwPcv6zMzsj2W9pfA/gf8MvFvUdmFEdAIk72OS9nHA7qJ+HUnbH5G0WFKrpNaurq5MijYzG6wyu6JZ0ueAfRHRJmlG
OUNKtB13vmxErABWQOGU1NOp0czsVPT1Ke7lnn6+YcMGli5dSnd3N1/+8pe59dZb+7QOyHZL4TJgjqRXgGbgSkk/BvZKGguQvO9L+ncA44vG1wB7MqzPzGzA6O7uZsmSJfzyl7+kvb2dNWvW0N7e3ufrySwUImJZRNRERC2FA8j/EBFfBNYBi5Jui4C1yfQ6oFHSMEl1wERgU1b1mZkNJJs2beIjH/kIH/rQhzjzzDNpbGxMr27uS3lcp3An8BlJO4DPJPNExFagBWgHNgBLIqI7h/rMzPqd1157jfHj39uZUlNTw2uvvdbn66nIXVIj4jHgsWT6AHDVCfo1AU2VqMnMbCApdUsi3yXVzGyQqqmpYffu907Q7Ojo4KKLLurz9TgUzMwGgGnTprFjxw527drF22+/TXNzcybPd/ZDdszMeimPOxgPGTKE733ve8ycOZPu7m5uuOEGJk+e3Pfr6fMlmplZJmbPns3s2bMzXYd3H5mZWcqhYGZmKYeCmZmlHApmZpZyKJiZWcqhYGZmKZ+SambWS69+Y0qfLm/CbS+dtM8NN9zA+vXrGTNmDFu2bOnT9RfzloKZ2QDwpS99iQ0bNmS+HoeCmdkAcMUVVzBy5MjM1+NQMDOzlEPBzMxSmYWCpOGSNkl6QdJWSXck7cslvSZpc/KaXTRmmaSdkrZLmplVbWZmVlqWZx8dBq6MiEOShgJPSvpl8tl3I+Ku4s6SJlF4bOdk4CLg7yVd7KevmZlVTmahEIXHBB1KZocmr+MfHfSeuUBzRBwGdknaCUwHnsmqRjOzU1HOKaR9beHChTz22GPs37+fmpoa7rjjDm688cY+X0+m1ylIqgLagI8A90bEs5I+C9wi6XqgFfhaRPwOGAf8n6LhHUnbsctcDCwGmDBhQpblm5n1G2vWrKnIejI90BwR3RFRD9QA0yV9HPg+8GGgHugEvp10L/Ww0eO2LCJiRUQ0RERDdXV1JnWbmQ1WFTn7KCL+BXgMmBURe5OweBf4IYVdRFDYMhhfNKwG2FOJ+szMrCDLs4+qJZ2fTJ8F/BnwsqSxRd0+Dxy9Xnsd0ChpmKQ6YCKwKav6zMxOpHBIdOA7le+R5TGFscCq5LjCGUBLRKyX9CNJ9RR2Db0C3AQQEVsltQDtwBFgic88MrNKGz58OAcOHGDUqFFIpfZqDwwRwYEDBxg+fHivxmV59tGLwCdKtF/Xw5gmoCmrmszMTqampoaOjg66urryLuW0DR8+nJqaml6N8V1SzcyKDB06lLq6urzLyI1vc2FmZimHgpmZpRwKZmaWciiYmVnKoWBmZimHgpmZpRwKZmaWciiYmVnKoWBmZimHgpmZpRwKZmaWciiYmVnKoWBmZimHgpmZpbJ88tpwSZskvSBpq6Q7kvaRkh6VtCN5v6BozDJJOyVtlzQzq9rMzKy0LLcUDgNXRsQlQD0wS9KfALcCGyNiIrAxmUfSJKARmAzMAu5LntpmZmYVklkoRMGhZHZo8gpgLrAqaV8FXJNMzwWaI+JwROwCdgLTs6rPzMyOl+kxBUlVkjYD+4BHI+JZ4MKI6ARI3sck3ccBu4uGdyRtxy5zsaRWSa3vh8flmZn1J5mGQkR0R0Q9UANMl/TxHrqXekJ2lFjmiohoiIiG6urqPqrUzMygQmcfRcS/AI9ROFawV9JYgOR9X9KtAxhfNKwG2FOJ+szMrCDLs4+qJZ2fTJ8F/BnwMrAOWJR0WwSsTabXAY2ShkmqAyYCm7Kqz8zMjjckw2WPBVYlZxCdAbRExHpJzwAtkm4EXgXmAUTEVkktQDtwBFgSEd0Z1mdmZsfILBQi4kXgEyXaDwBXnWBME9CUVU1mZtYzX9FsZmYph4KZmaUcCmZmlnIomJlZyqFgZmYph4KZmaUcCmZmlnIomJlZyqFgZmYph4KZmaUcCmZmlnIomJlZyqFgZmYph4KZmaUcCmZmlsryyWvjJf1a0jZJWyUtTdqXS3pN0ubkNbtozDJJOyVt
lzQzq9rMzKy0LJ+8dgT4WkQ8L+kcoE3So8ln342Iu4o7S5oENAKTgYuAv5d0sZ++ZmZWOZltKUREZ0Q8n0y/AWwDxvUwZC7QHBGHI2IXsBOYnlV9ZmZ2vIocU5BUS+HRnM8mTbdIelHSA5IuSNrGAbuLhnXQc4iYmVkfyzwUJI0AHga+GhG/B74PfBioBzqBbx/tWmJ4lFjeYkmtklq7urqyKdrMbJAqKxQkbSynrUSfoRQCYXVE/AwgIvZGRHdEvAv8kPd2EXUA44uG1wB7jl1mRKyIiIaIaKiuri6nfDMzK1OPoSBpuKSRwGhJF0gambxqKRwM7mmsgPuBbRHxnaL2sUXdPg9sSabXAY2ShkmqAyYCm3r9jczM7JSd7Oyjm4CvUgiANt7bxfN74N6TjL0MuA54SdLmpO0vgYWS6insGnolWQcRsVVSC9BO4cylJT7zyMyssnoMhYi4G7hb0p9HxD29WXBEPEnp4wS/6GFME9DUm/WYmVnfKes6hYi4R9K/BmqLx0TEgxnVZWZmOSgrFCT9iMIZQ5uBo7t0AnAomJm9j5R7RXMDMCkijjtF1MzM3j/KvU5hC/CBLAsxM7P8lbulMBpol7QJOHy0MSLmZFKVmZnlotxQWJ5lEWZm1j+Ue/bR41kXYmZm+Sv37KM3eO8+RGcCQ4E3I+LcrAozM7PKK3dL4ZzieUnX4Ntam5m975zSXVIj4n8DV/ZtKWZmlrdydx99oWj2DArXLfiahYy9+o0peZfQaxNueynvEszsNJR79tG/LZo+QuFGdnP7vBozM8tVuccU/kPWhZiZWf7KfchOjaRHJO2TtFfSw5Jqsi7OzMwqq9wDzX9N4SE4F1F4bvLPkzYzM3sfKTcUqiPiryPiSPJaCfhZmGZm7zPlhsJ+SV+UVJW8vggc6GmApPGSfi1pm6StkpYm7SMlPSppR/J+QdGYZZJ2Stouaeapfy0zMzsV5YbCDcB84P8CncC1wMkOPh8BvhYRHwP+BFgiaRJwK7AxIiYCG5N5ks8agcnALOA+SVW9+zpmZnY6yg2F/wosiojqiBhDISSW9zQgIjoj4vlk+g1gG4XjEXOBVUm3VcA1yfRcoDkiDkfELmAnvmrazKyiyg2FqRHxu6MzEfE68IlyVyKpNun/LHBhRHQmy+kExiTdxgG7i4Z1JG3HLmuxpFZJrV1dXeWWYGZmZSg3FM44Zt//SMq/GnoE8DDw1Yj4fU9dS7Qdd9V0RKyIiIaIaKiu9rFuM7O+VO4Vzd8Gnpb0NxT+UM8Hmk42SNJQCoGwOiJ+ljTvlTQ2IjoljQX2Je0dwPii4TXAnjLrMzOzPlDWlkJEPAj8O2Av0AV8ISJ+1NMYSQLuB7ZFxHeKPloHLEqmFwFri9obJQ2TVAdMBDaV+0XMzOz0lbulQES0A+29WPZlwHXAS5I2J21/CdwJtEi6EXgVmJcsf6uklmQdR4AlEdHdi/WZmdlpKjsUeisinqT0cQKAq04wpokydkuZmVk2Tul5CmZm9v7kUDAzs5RDwczMUg4FMzNLORTMzCzlUDAzs5RDwczMUg4FMzNLORTMzCzlUDAzs5RDwczMUg4FMzNLORTMzCzlUDAzs5RDwczMUpmFgqQHJO2TtKWobbmk1yRtTl6ziz5bJmmnpO2SZmZVl5mZnViWWworgVkl2r8bEfXJ6xcAkiYBjcDkZMx9kqoyrM3MzErILBQi4gng9TK7zwWaI+JwROwCdgLTs6rNzMxKy+OYwi2SXkx2L12QtI0Ddhf16UjajiNpsaRWSa1dXV1Z12pmNqhUOhS+D3wYqAc6gW8n7aWe5RylFhARKyKiISIaqqurMynSzGywqmgoRMTeiOiOiHeBH/LeLqIOYHxR1xpgTyVrMzOzCoeCpLFFs58Hjp6ZtA5olDRMUh0wEdhUydrMzAyGZLVgSWuAGcBoSR3A7cAMSfUUdg29AtwEEBFbJbUA7cARYElEdGdVm5mZlZZZKETEwhLN9/fQ
vwloyqoeMzM7OV/RbGZmKYeCmZmlHApmZpZyKJiZWcqhYGZmKYeCmZmlHApmZpZyKJiZWcqhYGZmKYeCmZmlHApmZpZyKJiZWcqhYGZmKYeCmZmlHApmZpbKLBQkPSBpn6QtRW0jJT0qaUfyfkHRZ8sk7ZS0XdLMrOoyM7MTy3JLYSUw65i2W4GNETER2JjMI2kS0AhMTsbcJ6kqw9rMzKyEzEIhIp4AXj+meS6wKpleBVxT1N4cEYcjYhewE5ieVW1mZlZapY8pXBgRnQDJ+5ikfRywu6hfR9J2HEmLJbVKau3q6sq0WDOzwaa/HGhWibYo1TEiVkREQ0Q0VFdXZ1yWmdngUulQ2CtpLEDyvi9p7wDGF/WrAfZUuDYzs0Gv0qGwDliUTC8C1ha1N0oaJqkOmAhsqnBtZmaD3pCsFixpDTADGC2pA7gduBNokXQj8CowDyAitkpqAdqBI8CSiOjOqjazSnj1G1PyLqFXJtz2Ut4lWD+QWShExMITfHTVCfo3AU1Z1WNmZifXXw40m5lZP+BQMDOzlEPBzMxSDgUzM0s5FMzMLOVQMDOzVGanpJr1pUv/4sG8S+i1R87JuwKz3vOWgpmZpRwKZmaWciiYmVnKoWBmZimHgpmZpRwKZmaWciiYmVnKoWBmZqlcLl6T9ArwBtANHImIBkkjgYeAWuAVYH5E/C6P+szMBqs8txQ+HRH1EdGQzN8KbIyIicDGZN7MzCqoP+0+mgusSqZXAdfkV4qZ2eCUVygE8CtJbZIWJ20XRkQnQPI+JqfazMwGrbxuiHdZROyRNAZ4VNLL5Q5MQmQxwIQJE7Kqz8xsUMolFCJiT/K+T9IjwHRgr6SxEdEpaSyw7wRjVwArABoaGqJSNZvZqRtod7lt+9b1eZeQm4rvPpJ0tqRzjk4DVwNbgHXAoqTbImBtpWszMxvs8thSuBB4RNLR9f8kIjZIeg5okXQj8CowL4fazMwGtYqHQkT8FrikRPsB4KpK12NmZu/pT6ekmplZzhwKZmaWciiYmVnKoWBmZqm8Ll4zM+u3Xv3GlLxL6LUJt73UJ8vxloKZmaUcCmZmlnIomJlZyqFgZmYph4KZmaUcCmZmlnIomJlZyqFgZmYph4KZmaUcCmZmlnIomJlZqt+FgqRZkrZL2inp1rzrMTMbTPpVKEiqAu4FPgtMAhZKmpRvVWZmg0e/CgVgOrAzIn4bEW8DzcDcnGsyMxs0FBF515CSdC0wKyK+nMxfB3wyIm4p6rMYWJzMfhTYXvFCK2c0sD/vIuyU+fcbuN7vv90HI6K61Af97XkKKtH2R6kVESuAFZUpJ1+SWiOiIe867NT49xu4BvNv1992H3UA44vma4A9OdViZjbo9LdQeA6YKKlO0plAI7Au55rMzAaNfrX7KCKOSLoF+DugCnggIrbmXFaeBsVusvcx/34D16D97frVgWYzM8tXf9t9ZGZmOXIomJlZyqHQT/l2HwOXpAck7ZO0Je9arHckjZf0a0nbJG2VtDTvmirNxxT6oeR2H/8EfIbCabrPAQsjoj3Xwqwskq4ADgEPRsTH867HyidpLDA2Ip6XdA7QBlwzmP7b85ZC/+TbfQxgEfEE8HredVjvRURnRDyfTL8BbAPG5VtVZTkU+qdxwO6i+Q4G2b+YZnmTVAt8Ang251IqyqHQP530dh9mlh1JI4CHga9GxO/zrqeSHAr9k2/3YZYTSUMpBMLqiPhZ3vVUmkOhf/LtPsxyIEnA/cC2iPhO3vXkwaHQD0XEEeDo7T62AS2D/HYfA4qkNcAzwEcldUi6Me+arGyXAdcBV0ranLxm511UJfmUVDMzS3lLwczMUg4FMzNLORTMzCzlUDAzs5RDwczMUg4FG7QkdReddri5N3ejlTRD0vrTXP9jkk7p4fCSVkq69nTWb1ZKv3ocp1mF/b+IqM9jxcmdcM36HW8pmB1D0iuS/rukZyS1SvpXkv5O0m8k/ceirudKekRSu6S/knRGMv77
ybitku44Zrm3SXoSmFfUfoakVZL+m6QqSd+S9JykFyXdlPSRpO8l6/pbYEyF/nHYIOMtBRvMzpK0uWj+f0TEQ8n07oj4lKTvAispXOk6HNgK/FXSZzowCfhnYAPwBeBvgP8SEa8nWwMbJU2NiBeTMW9FxJ8CJAEzBFgNbImIJkmLgYMRMU3SMOApSb+icLfOjwJTgAuBduCBPv7nYeZQsEGtp91HR+819RIwIrm3/huS3pJ0fvLZpoj4LaS3tvhTCqEwP/njPgQYSyE4jobC0dA56gcUbmPSlMxfDUwtOl5wHjARuAJYExHdwB5J/3AqX9jsZLz7yKy0w8n7u0XTR+eP/s/UsfeICUl1wH8CroqIqcDfUtjCOOrNY8Y8DXxa0tE+Av48IuqTV11E/OoE6zPrcw4Fs1M3PbmT7RnAAuBJ4FwKf/gPSroQ+OxJlnE/8Avgp5KGULgJ4leS2zcj6WJJZwNPAI3JMYexwKez+Uo22Hn3kQ1mxx5T2BARZZ+WSuFOqHdS2M//BPBIRLwr6R8pHHv4LfDUyRYSEd+RdB7wI+DfA7XA88ltnLuAa4BHgCsp7M76J+DxXtRpVjbfJdXMzFLefWRmZimHgpmZpRwKZmaWciiYmVnKoWBmZimHgpmZpRwKZmaW+v/M9Q0KlRacBAAAAABJRU5ErkJggg==",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
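   ],
   "source": [
    "# Analyze the relationship between port of embarkation (Embarked) and survival\n",
    "# Encoding: port S is 0, port C is 1, port Q is 2\n",
    "sns.countplot(x = 'Embarked',hue = 'Survived',data = train_data)\n",
    "# sns.countplot() draws a bar chart of the number of observations in each category"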
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Conclusion: the port of embarkation also appears to affect the chance of survival."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 4: Building the models"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Linear regression model\n",
    "* Random forest model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Case 1: Linear regression"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "# LinearRegression model\n",
    "from sklearn.linear_model import LinearRegression\n",
    "from sklearn.model_selection import KFold\n",
    "# Cross-validation: split the training set into 3 folds; in each round, train on two folds\n",
    "# (e.g. folds 1 and 2) and predict on the held-out fold (fold 3); the held-out predictions are combined for scoring\n",
    "# Select the features\n",
    "predictors = [\"Pclass\",\"Sex\",\"Age\",\"SibSp\",\"Parch\",\"Fare\",\"Embarked\"]\n",
    "# Create the LinearRegression model\n",
    "LR = LinearRegression()\n",
    "# n_splits = 3 splits the data into 3 folds, giving 3 rounds of cross-validation\n",
    "kf = KFold(n_splits = 3,shuffle = False)\n",
    "predictions = []\n",
    "for train_index,test_index in kf.split(train_data):\n",
    "    # iloc selects rows by integer position\n",
    "    train_predictors = train_data[predictors].iloc[train_index,:]\n",
    "    # Get the corresponding labels\n",
    "    train_target = train_data['Survived'].iloc[train_index]\n",
    "    # Fit the model on the training folds\n",
    "    LR.fit(train_predictors,train_target)\n",
    "    # Predict on the held-out fold\n",
    "    test_predictions = LR.predict(train_data[predictors].iloc[test_index,:])\n",
    "    # Collect the predictions for this fold\n",
    "    predictions.append(test_predictions)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "accuracy: 0.7833894500561167\n"
     ]
    }
   ],
   "source": [
    "# 验证模型的精度\n",
    "# 逻辑回归输出为概率值，需要对其结果进行转化\n",
    "predictions = np.concatenate(predictions,axis = 0)\n",
    "predictions[predictions > 0.5] = 1\n",
    "predictions[predictions <= 0.5] = 0\n",
    "# 模型准确度\n",
    "accuracy = sum(predictions == train_data['Survived'])/len(predictions)\n",
    "print('accuracy:',accuracy)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The accuracy is 0.7833894500561167 (about 78.3%)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Case 2: Random forest"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "accuracy: 0.8103254769921437\n"
     ]
    }
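   ],
   "source": [
    "# RandomForestClassifier model\n",
    "from sklearn.ensemble import RandomForestClassifier\n",
    "from sklearn.model_selection import cross_val_score\n",
    "# Reuse the best hyperparameter combination tuned in the traffic-accident claim review prediction task\n",
    "Ir4 = RandomForestClassifier(random_state=200 ,n_estimators = 69,max_depth = 13,min_samples_leaf =1 ,min_samples_split =2 )\n",
    "Ir4.fit(train_data[predictors],train_data['Survived'])\n",
    "score = cross_val_score(Ir4,train_data[predictors],train_data['Survived'],cv=3)\n",
    "# Unlike linear regression, whose continuous outputs had to be thresholded, the classifier predicts 0/1 labels directly,\n",
    "# so cross_val_score returns the accuracy on each of the 3 folds\n",
    "# Print the mean accuracy across the folds\n",
    "print('accuracy:', score.mean())"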
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The accuracy improves, from about 0.783 with linear regression to about 0.810 with the random forest."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After reading other people's blog posts, I found that the random forest model is indeed more accurate on this problem."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary and reflections"
   ]
  },
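  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As the assignment makes clear, this report faced three main problems:\n",
    "\n",
    "1. Missing data: the dataset has many missing values, so they had to be filled in first.\n",
    "2. Data conversion: text fields had to be converted into numeric features.\n",
    "3. Model training and prediction.\n",
    "\n",
    "By consulting blog posts online, I learned how to build two predictive models (linear regression and random forest). When I first received the dataset, the sheer volume of data left me unsure where to start. After studying how others grouped the data, I tried several ways of grouping it and then worked out how to fill in the missing values. Finally, I selected the features I found most convincing, analyzed them, and fed them into the models for prediction."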
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Python is indeed powerful: a few dozen lines of code are enough to train a model."
   ]
  }
 ],
 "metadata": {
  "interpreter": {
   "hash": "7d990ae127390bdb9fa9be493db589a953879e6aa6f077cb1d6f5050e6ee2420"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
